class: center, middle, inverse, title-slide

.title[
# Normal Distributions and Rescaling
]
.author[
### S. Mason Garrison
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://psychmethods.github.io/coursenotes/" target="_blank">Methods in Psychological Research</a>
</span>
</div>

---
class: middle

# Normal Distribution

---

## Normal Distribution

Def: a particular bell-shaped curve defined by the following density function

`\(f(x)= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^{2}}\)`

- Formula has two parameters
  - `\(\mu\)`, the mean
  - `\(\sigma\)`, the standard deviation
- The standard normal `\((\mu=0; \sigma=1)\)` simplifies the equation

---

# Normal Distributions with Different Means

- The mean is located at the center of the symmetric curve and is the same as the median.
- Changing `\(\mu\)` without changing `\(\sigma\)` moves the Normal curve along the horizontal axis without changing its variability.

.small[
<img src="data:image/png;base64,#rnorms_files/figure-html/norm-1.png" width="55%" style="display: block; margin: auto;" />
]

---

.small[

``` r
#### Normal Distribution
# Display the normal distributions with various means
x <- seq(-80, 80, length.out = 1000)
hx <- dnorm(x)  # standard normal density (mean 0, sd 1), drawn as the dashed reference curve
colors <- c("red", "blue", "green", "purple", "gold", "black")

plot(x, hx, type = "l", lty = 2, xlab = "x value", ylab = "Density",
     main = "Comparison of Normal Distributions", xlim = c(-5, 7))

# Overlay curves that differ only in their mean
location <- c(2, 4, -2)
for (i in 1:3) {
  lines(x, dnorm(x, mean = location[i]), lwd = 1, col = colors[i])
}
```

<img src="data:image/png;base64,#rnorms_files/figure-html/unnamed-chunk-2-1.png" width="40%" style="display: block; margin: auto;" />
]

---

# Normal Distributions with Different Standard Deviations

.pull-left[
- The standard deviation `\(\sigma\)` controls the variability of a Normal curve.
- When the standard deviation is larger, the area under the normal curve is less concentrated about the mean.
- The standard deviation is the distance from the center to the change-of-curvature points on either side.
]
.pull-right.small[
<img src="data:image/png;base64,#rnorms_files/figure-html/zsd-1.png" width="90%" style="display: block; margin: auto;" />
]

---

.small[

``` r
# Display the normal distributions with various standard deviations
plot(x, hx, type = "l", lty = 2, xlab = "x value", ylab = "Density",
     main = "Comparison of Normal Distributions", xlim = c(-10, 10))

# Loop over positions so each standard deviation gets its own entry of `colors`
sds <- c(.5, 2, 4, 6)
for (i in seq_along(sds)) {
  lines(x, dnorm(x, sd = sds[i]), lwd = 1, col = colors[i])
}
```

<img src="data:image/png;base64,#rnorms_files/figure-html/unnamed-chunk-3-1.png" width="40%" style="display: block; margin: auto;" />
]

---

# Normal Distribution

.pull-left[
- In the Normal distribution, with mean `\(\mu\)` and standard deviation `\(\sigma\)`:
  - approximately `\(68\%\)` of the observations fall within 1 `\(\sigma\)` of `\(\mu\)`
  - approximately `\(95\%\)` of the observations fall within 2 `\(\sigma\)` of `\(\mu\)`
  - approximately `\(99.7\%\)` of the observations fall within 3 `\(\sigma\)` of `\(\mu\)`
- This property is sometimes called the `68-95-99.7 Rule`
]
.pull-right[
<img src="data:image/png;base64,#../img/normal.png" width="95%" style="display: block; margin: auto;" />
]

---

<img src="data:image/png;base64,#../img/normal.png" width="70%" style="display: block; margin: auto;" />

---

# Worked Example

.pull-left[
- The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for seventh-grade students in Gary, Indiana, is close to Normal.
- Suppose the distribution is `N(6.84, 1.55)`.
]

--

.pull-right[
.question[
- Q. What is the mean of the distribution?
- Q. What is the standard deviation of the distribution?]
]

--

<img src="data:image/png;base64,#../img/norms.png" width="55%" style="display: block; margin: auto;" />

---

# Worked Example

.pull-left[
- The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for seventh-grade students in Gary, Indiana, is close to Normal.
- Suppose the distribution is `N(6.84, 1.55)`.
]

--

.pull-right[
- Sketch the Normal density curve for this distribution.
- Q. 
What percent of ITBS scores is between 3.74 and 9.94?
- Q. What percent of the scores is below 3.74?
]

--

<img src="data:image/png;base64,#../img/norms.png" width="55%" style="display: block; margin: auto;" />

---

.question[Check your understanding: What percent of the scores is above 5.29?]

<br>

<img src="data:image/png;base64,#../img/norms.png" width="100%" style="display: block; margin: auto;" />

---

# Standard Normal

- Normal is a model of the real world
- Not exact, but it is a convenient model for many things
  - Physical features
  - Psychological features
  - Performance measures

--

- Not all variables are normal
  - Skewed variables (e.g., income)
  - Any count variable (number of kids, mistakes on an exam)

---

# Real World Data

.pull-left[
- Many variables follow this distribution (but not all)
- I have plotted histograms of data we have already used in this class, overlaid with a Normal curve.
]

--

.small.pull-right[
<img src="data:image/png;base64,#rnorms_files/figure-html/example-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Height of Children (Galton dataset)

.small[

``` r
library(HistData)
library(ggplot2)
library(tidyverse)

Galton %>%
  ggplot(aes(x = child)) +
  geom_histogram(fill = "red") +
  # Normal density scaled by n, using the sample mean and SD of child height
  stat_function(
    fun = function(x, mean, sd, n) {
      n * dnorm(x = x, mean = mean, sd = sd)
    },
    args = with(Galton, list(mean = mean(child), sd = sd(child), n = length(child)))
  ) +
  scale_x_continuous("Heights of Children") +
  theme_minimal()
```

<img src="data:image/png;base64,#rnorms_files/figure-html/unnamed-chunk-9-1.png" width="40%" style="display: block; margin: auto;" />
]

---

# IMDb movie ratings (movies dataset)

.pull-left.midi[

``` r
library(ggplot2movies)
data(movies)

ggmovie <- ggplot(movies, aes(x = rating)) +
  geom_histogram(fill = "blue") +
  # Frequency polygon of simulated normal draws with the same mean and SD as the ratings
  geom_freqpoly(aes(
    x = rnorm(length(rating)) * sd(rating) + mean(rating)),
    color = "black") +
  scale_x_continuous("IMDb Movie Ratings") +
  theme_minimal()
```

]
.pull-right[
<img 
src="data:image/png;base64,#rnorms_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Temperature in Nottingham (nottem dataset)

<img src="data:image/png;base64,#rnorms_files/figure-html/nottem-1.png" width="70%" style="display: block; margin: auto;" />

---

# Standard Normal

- Normal distribution tricks
  - Symmetric
  - 50% of area above zero
  - Total proportion is 1.0 (or 100%)

---

# Area under the Normal Distribution

<img src="data:image/png;base64,#../img/normal.png" width="60%" style="display: block; margin: auto;" />

---
class: middle

# Wrapping Up...

---
class: middle

# Rescaling

---

# Rescaling

- All Normal distributions are the same
  - if we measure in units of size `\(\sigma\)` from the mean `\(\mu\)` as center.
- We can convert any variable into the same metric as the standard normal
  - Changing to these units is called standardizing or rescaling.

---

# Converting Formulas

.pull-left[
- Sample statistics
  - `\(z_{i}\)` = `\(\frac{x_{i}-\bar{x}}{s}\)`
]

--

.pull-right[
- Population
  - `\(z_{i}\)` = `\(\frac{x_{i}-\mu}{\sigma}\)`
]

---

# Merits

.pull-left[
- Advantages
  - Allows us to compare scores on a common metric
  - Origin is 0. 
The mean
  - The units are 1, the standard deviation
  - '+' values above the mean
  - '-' values below the mean
]

--

.pull-right[
- We can compare across measurement scales
- Shape of the distribution does NOT CHANGE
- We can go from z-scores to raw scores
]

---

# Demo

.pull-left[
.small[

``` r
library(ggplot2movies)
# Raw data
movies$rating[1:10]
```

```
##  [1] 6.4 6.0 8.2 8.2 3.4 4.3 5.3 6.7 6.6 6.0
```

``` r
# Rescaling
variable <- movies$rating
scale(variable)[1:10]
```

```
##  [1]  0.30079877  0.04323788  1.45982279  1.45982279 -1.63090793
##  [6] -1.05139592 -0.40749368  0.49396944  0.42957922  0.04323788
```
]
- Mean = 5.93 (SD = 1.55)
]

--

.pull-right[
.midi[

| Raw| Z_Score|
|---:|-------:|
| 6.4|    0.30|
| 6.0|    0.04|
| 8.2|    1.46|
| 8.2|    1.46|
| 3.4|   -1.63|
| 4.3|   -1.05|
| 5.3|   -0.41|
| 6.7|    0.49|
| 6.6|    0.43|
| 6.0|    0.04|

]
]

---

# Demo

.pull-left[

``` r
plot(density(variable)) # no scaling
```

<img src="data:image/png;base64,#rnorms_files/figure-html/unnamed-chunk-14-1.png" width="90%" style="display: block; margin: auto;" />
]
.pull-right[

``` r
plot(density(scale(variable))) # with scaling
```

<img src="data:image/png;base64,#rnorms_files/figure-html/unnamed-chunk-15-1.png" width="90%" style="display: block; margin: auto;" />
]

---

# Worked Z-Score Problem

- Here are the IQ test scores of 31 7th-grade girls in a Midwest school district.

<br>

<img src="data:image/png;base64,#../img/iqz.png" width="70%" style="display: block; margin: auto;" />

---

# Worked Z-Score Problem

A) We expect IQ scores to be approximately Normal.

--

- Make a stem plot to check that there are no major departures from normality.
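
One way to draw such a stem plot is base R's `stem()`. The sketch below is only illustrative: the 31 scores appear in the slide image rather than in the text, so `iq_sim` is a hypothetical stand-in of simulated values, not the actual data.

```r
# Hypothetical stand-in for the 31 IQ scores (simulated, NOT the real data)
set.seed(1)
iq_sim <- round(rnorm(31, mean = 100, sd = 15))

# Stem-and-leaf display; look for one peak, rough symmetry, and no extreme outliers
stem(iq_sim)
```

With the real scores typed into a numeric vector, the same `stem()` call applies.
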
--

<img src="data:image/png;base64,#../img/iqstem.png" width="65%" style="display: block; margin: auto;" />

---

# Worked Z-Score Problem

B) Find the mean and standard deviation

--

- Mean = 105.84 = `\(\frac{\sum x_{i}}{n}\)` = 3281/31
- SD = 14.27 = `\(s\)` = `\(\sqrt{\frac{\sum^{n}_{i=1}(x_{i}-\bar{x})^{2}}{n-1}}\)` = `\(\sqrt{\frac{\sum^{n}_{i=1}(x_{i}-105.84)^{2}}{30}}\)`

---

# Worked Z-Score Problem

C) What proportion of scores are within one standard deviation of the mean?

- One SD above mean = 105.84 + 14.27 = 120.11
- One SD below mean = 105.84 - 14.27 = 91.57
- 23/31 = 0.74

--

<img src="data:image/png;base64,#../img/workz.png" width="65%" style="display: block; margin: auto;" />

---

# Worked Z-Score Problem

D) What proportion of scores are within TWO standard deviations of the mean?

- TWO SD above mean = 105.84 + 2*(14.27) = 134.38
- TWO SD below mean = 105.84 - 2*(14.27) = 77.30
- 29/31 = 0.935

---

# Worked Z-Score Problem

E) What would these proportions be in an exactly Normal distribution?

- +/- One SD?

<img src="data:image/png;base64,#../img/table.png" width="90%" style="display: block; margin: auto;" />

---
class: middle

# Continued in PowerPoint...